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ABSTRACT 

We report on a broader evaluation of statistical bootstrap resampling meth- 
ods as a tool for pixel-level calibration and imaging fidelity assessment in radio 
interferometry. Pixel-level imaging fidelity assessment is a challenging problem, 
important for the value it holds in robust scientific interpretation of interfero- 
metric images, enhancement of automated pipeline reduction systems needed to 
broaden the user community for these instruments, and understanding leading- 
edge direction-dependent calibration and imaging challenges for future telescopes 
such as the Square Kilometer Array This new computational approach is now 
possible because of advances in statistical resampling for data with long-range 
dependence and the available performance of contemporary high-performance 
computing resources. We expand our earlier numerical evaluation to span a 
broader domain subset in simulated image fidelity and source brightness distri- 
bution morphologies. As before, we evaluate the statistical performance of the 
bootstrap resampling methods against direct Monte Carlo simulation. We find 
both model-based and subsample bootstrap methods to continue to show sig- 
nificant promise for the challenging problem of interferometric imaging fidelity 
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assessment, when evaluated over the broader domain subset. We report on their 
measured statistical performance and guidelines for their use and application in 
practice. We also examine the performance of the underlying polarization self- 
calibration algorithm used in this study over a range of parallactic angle coverage. 

Subject headings: techniques: image processing — methods: statistical — tech- 
niques: interferometric — techniques: polarimetric 



Introduction 



Radio-interferometric image formation requires a solution for both the source bright- 
ness distribution over the image field and the interferometric array instrumental and sig- 
nal propagation effects, estimated jointly from measurements of the electric vector spatial 
coherence function of the incident radiation field measured on each interferometer base- 



line ([Thompson. Moran. fc Swensonl l200ll . and references therein). The coherence data are 
sparsely sampled, leadin g to an ill-posed inverse imaging pr oblem that requires regularization 
for convergent solution (ICornwell. Braun. fc Briggd Il999l ). This regularization is typically 
imposed as a constraint on the properties of the source brightness dist ributions during de- 



convolution, includin g positivity and compact support (|H ogbo mlll974), or via information 



or entropy measures (iNarayan fc Nityanandal 1 1984 : ICornwell &: Evans 



1985|). 



The fidelity of the resulting source brightness distribution cannot be readily estimated 
given the analytic intractability of the coupled, non-linear calibration and imaging equation, 
combined with the fact that the parent probability distribution of the measured spatial 
coherence function is parametrized by the source brightness distribution and the instrumental 
array calibration, both unknown a prior at the time of observation. Instead, achieved image 
quality is typically estimated heuristically (jEkerslll986l ) or by approximate global measures. 
Common metrics include the ratio of the image brightness root-mean-square (rms) measured 
in regions of low brightness, c //, to the thermal noise limit, a t h, as calculated for the 
array from the known antenna sensitivities and receiver and system thermal noise levels, 
assuming idealized obser vations of an unresolved point source with perfect array calibration 
( iWrobel &: Walkerlll999l ). Other measures include the achieved dynamic range d r (which is 
not a measure of image fi delity per se ), expressed as the ratio of the peak brightness, I pea k to 
the off-source rms, ( Per ley 1986 ). or the use of the deepest negative in a total intensity 



T °ff 



image ( where Stokes / > 0) to derive a scaling relation to reduce the typical under-estimation 
of a oS f fcemballlll993h . 



These gross measures of image quality are however compromised by their idealized 
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underlying assumptions, including that of implied direction-independence of image fidelity 
across the field. In practice image fidelity is not constant across the field. It is direction- 
dependent (equivalently, pixel- dependent) due to residual, unmodeled instrumental calibra- 
tion errors jointly or separately in the visibility- or image-plane, and the interaction of these 
residual errors with non-linear deconvolution effects, amongst other factors. As a result, the 
assessment of pixel-dependent image fidelity is a general problem in radio interferometry. 
It is not confined to the important problem of calibrating direct i on-de pendent instrumental 
errors specifically, as described for example by iBhatnagar et al.l (120081 ). 



Although a challenging and largely unsolved problem, understanding pixel-level radio- 
interferometric imaging fidelity is essential for current and future telescope arrays. Current 
instruments need to be made more accessible to the larger astronomical community; in partic- 
ular their user base needs to expand beyond those who have invested the substantial time and 
effort required to acquire expertise in effective radio-interferometri c calibration and imaging 
data r eduction processes (heuristics as summarized for example by lPerley. Schwab, fc Bridle 
( 119891 )). This is best achieved through the provision of automated pipeline reduction sys- 
tems; these require associated estimates of image fidelity to allow effective, broad-based 
community scientific interpretation and analysis. 

For leading-edge future arrays, such as the Square Kilometer Array (SKA0), solving 
the problem of quantitative pixel-level image fidelity assessment is key to their effective 
design. Any interferometer that cannot reach its target thermal noise limit in a representative 
integration needed to meet its key science goals is dynamic-range limited. This is important 
for many interferometers but is particularly acute for the SKA given its h igh sensitivity. 



As a result, the SKA has stringent imaging dynamic range requirements (ISchilizzi et al. 
20071 ). particularly in continuum observing modes where d r ~ 10 6 will be required routinely 
across wide image fields in rapid survey modes and d r ~ 10 7 for individual, targeted fields. 
This dynamic range is not achieved routinely by contemporary radio-interferometric arrays 
except for the innermost pixels in a handful of images, and then only after extensive custom 
reduction by the most skilled radio interferometry practitioners. With increasing projected 
radial distance from the field center the dynamic range may typically decline by several orders 
of magnitude, for reasons noted above. Contemporary examples of high dynamic-range and 
high-sensiti vity observat i ons in radio interferometry, and th eir as sociated chal l enges , are 
provided by iGeller et al.l (120001 ) , Ide Bruyn &: Brent j end (120051 ) , and iNorris et al.l (120051 ) . 



Exponential advances in currently available and anticipated high-performance comput- 
ing (HPC) capabilities and resources (jBaderl 120081 ) allow fundamentally new approaches 



1 http:/ /www. skatelescope.org 
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to the problem of pixel- level radio-interferometric imaging fidelity assessment. Two comple- 
mentary approaches have rec ently been reported: a frequentist statistical resampling method 



( Kemball &: Martinseldl2005l . Paper I) and a Bayesian imaging technique (ISutton fe Wandelt 
20061 ). In Paper I, we described the first application of statistical bootstrap resampling tech- 



niques to the problem of interferometric imaging fi delity assessment. Statistical resampling 
is an active area in contemporary statistics research (jEfronll2003l ; Davison. Hinklev. &: Young 
2003 ), an d gener al reviews are provided in r e cent m o nograp h s by Davison fc Hinklevl ( 1997 \, 
Chernickl fjl999h . IPolitis. Romano, fc Wold Jl999h . lLahiril J2003h . and IZoubir fc Iskander 
r|2004h . 



In the context of future interferometer arrays, such as the SKA discussed above, the sta- 
tistical fidelity assessment methods discussed here are particularly important in understand- 
ing the optimal design of these telescopes and the contributing factors to their dynamic-range 
imaging performance. Specifically, although resampling methods may prove too computa- 
tionally expensive for real-time calibration and imaging at a telescope as computationally 
demanding as the SKA, they are very valuable tools during SKA design and development. 
These methods can also be used in off-line analysis of data from interferometer arrays with 
smaller numbers of elements and lower associated computational costs for calibration and 
imaging. 

We note that resampling techniques as described here are not strongly sensitive to the 
specific calibration and imaging algorithm in use; their goal is instead to provide a measure 
of the statistical properties of the underlying calibration and imaging estimator. Our initial 
evaluation in Paper I considered fidelity assessment of polarization calibration for Very Long 
Baseline Interferometry (VLBI) arrays, a repr esentative radi o-interferometric calibration and 
imaging problem of interest in its own right (IKemballl 119991 ). The statistical performance of 
both model-based and subsample bootstrap resampling was evaluated by inter-comparison 
of the bootstrap results with those obtained by direct Monte Carlo simulation for a single 
fixed two-component polarized source model and array configuration. Our initial study found 
both bootstrap resampling techniques to be computationally tractable with modern HPC 
resources and to have good statistical performance for image variance estimation for the 
single source model and array configuration considered in that initial study. 

In the current paper we describe results from broader evaluation of the applicability of 
statistical resampling to radio-interferometric imaging fidelity assessment. We extend the 
scope of our original evaluation in Paper I by expanding the problem domain along two 
axes, namely source model structure and array parallactic angle coverage. Both parameters 
vary the expected degree of systematic error in the polarimetric calibration and imaging 
estimator used in the study and therefore the pixel-level morphology and magnitude of the 
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image fidelity distribution over which the bootstrap resampling methods can be evaluated. 

In the expanded evaluation reported here, we find here that the model-based and sub- 
sample bootstrap resampling methods for data with long-range statistical dependence, used 
in Paper I, continue to show good statistical performance when applied over a wider range of 
observing configurations and source models. We refine guidelines and rules for their general 
use in radio interferometry and report initial results on their applicability to image bias 
estimation. 

The paper is structured as follows. In Section 2 the theory of bootstrap resampling 
is summarized, as applied to the interferometric calibration and imaging problem under 
examination. Section 3 describes the numerical simulation methods used to measure the 
statistical performance of the bootstrap resampling techniques under evaluation. Simulation 
results are presented in Section 4 and discussed in Section 5. In Section 6, we summarize 
the conclusions from the current work. 



2. Theory 

In this section, we recap the key elements of the statistical resampling method used 
in the current study for fidelity assessment of interferometric polarization calibration and 
imaging. A full description of the theory can be found in Paper I. 



2.1. The imaging equation for radio interferometry 



and generalized for image-plane effects by ICornwelll (119951 ) and iNoordaml (119951 ): 



We adopt the radio-interferomet r ic imaging equation developed bvlHamaker. Bregman. fc Sault 
(|l996l ). ISault. Hamaker. &: Bregman ( Il99d).lrlamaker fc Bregmanl Jl996h a ndlrlamaker! (hood). 



V„, 



n 



where j = v— T, the term V mn is the measured complex 4-vector of polarization cor- 
relations on the baseline b mn between antennas m and n (by common convention referred 
to as visibilities in this discipline), S(p) is the Stokes 4-vector of radio brightness in unit 
direction p, vector p s is the center of the field Q, and K is a constant (4 x 4) matrix that 
maps the Stokes parameters {I, Q, U, V} into the polarization receptor basis (e.g. orthogo- 
nal circular or linear) of the visibility polarization correlations. Direction-independent gains 
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of instrumental type k at antenna m are denoted G^, with associated direction-dependent 
gains = f(p); both are (2 x 2) Jones matrices in the polarization receptor basis, and of 
arbitrary (e.g. pixel or functional) parametrization. The outer matrix product is denoted 
by £g>, and complex conjugation by an asterisk. 



2.2. Polarization self-calibration 



For the interferometric polarization self-calibration and imaging problem considered 
here as a representative problem for the study of imaging fidelity assessment, the imag- 
ing equation [T] contains only direction-independent Jones matrices G 1 ^ = {P m , D m }, i.e. 
T£ is the null set {0} (or equivalently, the T£ are identity matrices), and where P m = 
diag(e _ j am ^\ e +J ' am w) is the Jones matrix containing the feed parallactic angle term a m (t), 
known analytically, and D m is an anti-diagonal matrix containing the antenna-based in- 
strumental polarization leakage terms for each nominally orthogonal polarization receptor 
basis. As described in Paper I, our polarization self- calibration method, as a joint iterative 
solution for D m and S(p), is a subset of general self-calibration within the imaging equa- 
tion framework. For each cycle of (non-progressive) iterative refinement of S(p), we solve 
for the instrumental polarization by minimizing % 2 , formed as the L 2 complex norm at the 
position of the unknown D m in equation [U by integrating the imaging equation from both 
the right-hand and left-hand sides to that position. 



2.3. Imaging fidelity assessment by bootstrap resampling 

Before considering bootstrap resampling as a technique for imaging fidelity assessment 
we need to formulate the problem of radio-interferometric calibration and imaging as a sta- 
tistical inferenc e problem. General statistical inference and estimation in signal processing 



is described by [KayJ ( 119931 ) . The measured visibility dataset, = V mn +J\f, V(m, n, t), 
can be considered a single realization of a vector over time t of random variables V drawn 
from a joint, multi-variate parent probability density function (PDF) of parametric form 
p(V; S(p), D m ,a t h)- Independent, identically-distributed (IID) thermal noise in each visibil- 
ity measurement is denoted by M. We consider the joint self-calibration solvers for D m and 
S(p) as statistical point estimators, denoted D m and S(p) respectively. In this standard 
statistical inference framework, the problem of imaging fidelity assessment is that of deter- 
mining the sampling distribution Pgr* of S(p). As described in Paper I, this problem is 
neither tractable analytically, nor is p(V; S(p), D m . a th ) (or therefore ^sfp)) known a priori. 
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Statistical resampling offers an alternative inference method for ^s(p) ^ na ^ does not re- 
quire analytic tractability or detailed knowledge of p(V; S(p), D mi a t h). These methods are 
computationally expensive however, but contemporary advances in HPC capabilities now 
make them viable approaches. The single realization of V, comprising the observed visibil- 
ity dataset, can be used to construct an empirical distribution function p(V; S(p), D m ,a t h)- 
Resamples V* drawn from p(V; S(p), D m ,a t h), conditional on the observed data V^f, and 
under statistical conditions where the bootstrap is applicable, mirror the statistical rela- 
tionship between V and the unknown parent distribution p(V; S(p), D m , a t h)- The imaging 
estimator S(p), is here chosen to represent the estimator for the restored image. Acting 
on the resampled visibility datasets, this estimator yields images S*; their statistical distri- 
bution relative to S* yo , which is obtained from the S(p) acting on the observed (template) 
realization, provides a bootstrap estimate of ^s{p) anc ^ hence an assessment of imaging fi- 
delity by our current definition. Here the subscript xy denotes the use of a pixel basis for 
the images. 

Radio-interferometric data have long-range s tatistical dep endence and thus require boot- 



strap innovations developed for dependent data (lLahiril 120031 ). In Paper I, we demonstrated 
the successful use of two such methods, namely the model-based and subsample bootstraps, 
for a single simulated test case in polarization self-calibration and imaging. We present an 
expanded evaluation in the following sections. 



3. Simulation methods 
3.1. Run codes 

The primary goal of this study is to evaluate the statistical performance of the boot- 
strap resampling techniques advanced in Paper I over a larger domain subset. We expand the 
previous study along two axes, namely the expected level of image fidelity and the range of 
source brightness distribution morphologies considered. We retain the same interferometric 
polarization and imaging problem used in Paper I as a representative test problem in this 
discipline. The details of our polarization self-calibration heuristic is described in detail in 
Paper I; broadly summarized, this is a joint solver for S(p) and D m over ten non-progressive 
self-calibration iterations, starting from an unpolarized unit-brightness point source bright- 
ness distribution and zero instrumental polarization for each antenna in the array. In the 
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current work, we also retain the the simulated Very Long Baseline Array (VLBA@) configu- 
ration and instrumental polarization terms enumerated in Table 1 of Paper I. 

To vary the expected image fidelity, we adjust the range of parallactic angle for the 
simulated array configuration by truncating the simulated observation duration. The VLBA 

is comprised of antennas with azimuth-elevation antenna mounts whic h, unlike equatorial 

moun ts, produce non-zero, time- variable feed parallactic angle variations ({Thompson. Moran. fc Swenson 
200ll ) . These parallactic angle ranges are labeled as run codes A through F, in order of de- 
creasing parallactic angle range, and are enumerated in Table [TJ A graphical representation 
is provided in Figure [TJ Run code A is the case of nearly-complete parallactic angle cover- 
age considered as the sole case studied in Paper I. Interferometric polarization calibration 
of linearly-polarized calibrators has a higher degree of systematic error for reduced paral- 
lactic angle coverage; this results from the increasing degeneracy introduced between the 
polarization basis functions e ±jam ^- ) and the un known full-polarization source brightness 
distribution S(p) for low parallactic angle range dMorris. Radhakrishnan. fc Seielstad 1964 ; 
Conwav fc KronbereJ Il969l : ICotton et all 1 19841 : iRoberts et al.lll984h . 



To vary source polarization morphology, we decrease the angular separation between 
the two simulated Gaussian components used in Paper I (as parametrized in Table 2 of 
Paper I), so increasing their degree of spatial overlap and hence polarization complexity 
We reproduce the source model in Stokes {/, Q, U, V} at the original component separation 
used in Paper I in Figure [21 and include the quantitative component parameter values in 
the caption to that Figure. The component separations used in the current study of source 
morphology variation are tabulated in Table [2] in this paper as run codes X, Y, and Z. For 
these run codes,however, we retain the same complete parallactic angle range used in Paper 
I, i.e. the same parallactic angle range as run code A in Table [TJ 



3.2. Monte Carlo reference simulations 



For each run code {A-F, X-Z}, we assess the statistical performance of each bootstrap 
resampling method by inter-comparison with results obtained by direct Monte Carlo simula- 
tion. As in Paper I, an ensemble of N s = 256 visibility datasets were generated for each run 
code by direct Monte Carlo simulation. Additive IID thermal noise contributions M were 
drawn as a ph a sor fr om the complex normal distribution CA/"(0, this distribution is 



defined by iKayl (119931 ) . The joint polarization self-calibration and imaging estimator for D Tl 



2 The National Radio Astronomy Observatory is a facility of the National Science Foundation operated 
under agreement by Associated Universities, Inc. 
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and S(p), was applied to each visibility dataset realization within the Monte Carlo ensemble 
generated for each run code. The bias and mean-squared error (MSE) of the sampling distri- 
bution Pgr* of S(p) were then computed for each restored ensemble image, in a pixel basis 
S xy , relative to the true source model brightness distribution S x y used to generate the simu- 
lated data, the latter convolved with a common median restoring beam matched separately 
to the spatial resolution of each run code. The variance was computed as iV" 1 Yl(^xy) ~ ^xy j 
where S xy is the sample mean across the ensemble, and the MSE per pixel relative to the 

M 2 

true model was computed as A^ 1 ^ (Sxy — S x y) . Each statistic was computed per image 
pixel, per self-calibration iteration number, and per Stokes parameter {/, Q, U, V}. These 
Monte Carlo estimates of the variance, bias, mean, and MSE of the sampling distribution 
Fs(p) °f the imaging estimator S(p) provide an estimate of truth against which the statistical 
performance of the bootstrap resampling methods can be assessed. 



3.3. Bootstrap simulations 

Both the model-based and subsample bootstrap methods described in Paper I were 
used in the current study. These bootstrap methods are applicable to data with long- 
term statistical dependence, as is the case in radio interferometry. The parameters for each 
bootstrap method, both the model-based (M1-M4) and subsample (S1-S4) bootstrap are 
reproduced here from Paper I as Table [3] and Table H] respectively. The reader is referred to 
Paper I for specific implementation details for these bootstrap methods. For each run code 
{A-F, X-Z}, a bootstrap ensemble (of size 256 realizations) was generated for each separate 
bootstrap method {M1-M4, S1-S4} by resampling from a single template realization (here 
chosen to be either the 127th or 128th of N s = 256) taken from the matching Monte Carlo 
ensemble for that run code. For each resulting bootstrap ensemble, the bias and MSE of 
the sampling distribution of the imaging estimator, S(p), were computed from the restored 
images obtained across the ensemble, S* y , relative to the restored image, S* yo , obtained for 
the template visibility dataset realization. This technique can be used directly with real 
data obtained by physical observation, unlike Monte Carlo simulation, which presupposes 
knowledge of the unknown true source model. 

The primary purpose of this study is to assess if a single observed visibility dataset 
realization can be used, through resampling, to estimate imaging fidelity, quantified here 
as the variance, bias, and MSE of the sampling distribution ^F§(g\ of a regularized imaging 
estimator S(p). We assess the statistical performance of each bootstrap method in the 
only practical way, namely against the sampling distribution properties obtained by direct 
Monte Carlo simulation. Conclusions regarding the statistical applicability of bootstrap 
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resampling for imaging fidelity obtained in this way translate directly to real observations if 
the general validity of the approach can be assessed numerically over a sufficiently complete 
domain subset. As part of that more complete evaluation, for each run code {A-F, X-Z}, 
we have completed a Monte Carlo simulation (denoted by code MC) and a set of resampling 
simulations for bootstrap codes {M1-M4,S1-S4}. 



3.4. HPC implementation 

As in Paper I, a modified version of the AIPS+H-B package was used to perform po- 
larization self-calibration and imaging, but for the work described in this paper, com- 
piled within the build framework for radio-interferometric analysis software described by 



Kemball. Crutcher. fc Hasan! (120081 ). In addition to the changes to the code base to support 
polarization self-calibration and statistical resampling, we also replaced the default x 2 New- 
ton solver for the imaging equation in the original package with a conjugate- gradient solver 
as implemented in the OptSolve++ packag^l 

The computations were parallelized over visibility dataset realization within each boot- 
strap or Monte Carlo ensemble and run on a 32-bit Intel Xeon Linux cluster operated as a 
national computational resource by the National Center for Supercomputing Applications^. 
The cluster has a peak single-precision floating point performance per processor of approxi- 
mately 12 Gflops. Each run was typically distributed over 64 processors and ran for ~ 3-4 
hours resulting in an integrated compute cost of O(Tflops). We note as before, however, 
that the problem is highly parallelizable and scalable over realization and has good load 
balancing characteristics; only the template realization and final statistical processing need 
to be performed in serial segments of the code. 



4. Simulation results 

The performance of the polarization self-calibration algorithm in direct Monte Carlo 
simulations over run codes {A-F} in shown in Figure Et these run codes have parallactic 
angle ranges depicted in Figure [TJ Here the MSE of the instrumental polarization estimator 
D m , averaged over antenna and receptor polarization, with reference to the known truth used 



3 http: / / aips2.nrao.edu 

4 OptSolve++ is distributed by Tech-X corporation (http://www.techxhome.com 
5 www. ncsa.uiuc.edu 



in the simulations, is separately plotted for each run code against self-calibration iteration 
number. The detailed computation of the MSE for D m is described in the caption to Figure[3j 



As described in preceding sections, the primary purpose of the current work is to assess 
the statistical performance of the bootstrap resampling techniques advanced in Paper I, over 
an expanded domain subset in this discipline. This has been achieved by increasing the 
range of imaging fidelity and source morphologies considered, enumerated by run codes {A- 
F, X-Z} in Tables [T]and [2J For each run code, a direct Monte Carlo simulation (MC) was 
performed and bootstrap resampling methods {M1-M4, S1-S4} subsequently applied to a 
single visibility dataset drawn from each Monte Carlo ensemble. 

For a qualitative assessment of the statistical performance of the bootstrap methods 
relative to the Monte Carlo reference, we plot in Figures H] through [7] the pixel-based Stokes 
Q variance images for the final iteration of polarization self-calibration for a subset of run 
codes, chosen here to be {B,D,X,Z}; these run codes are representative of typical bootstrap 
statistical performance observed. We choose Stokes Q here as both Stokes {Q,U} show 
residual calibration errors most clearly for the underlying polarization calibration algorithm. 
In each of these figures, the reference Monte Carlo variance image is shown in the upper 
left corner; the remaining images, in sequential order, are the variance images obtained 
by bootstrap resampling methods {M1-M4, S1-S4}. A default color mapping was used in 
generating the raster images in these figures; for each image the same color palette was 
mapped to the range of pixel data values in the image variance cube resulting from each 
run; each cube has axes (x,y), Stokes parameter {/, Q, U, V}, and self-calibration iteration 
number. The variance images contain contributions from both calibration and deconvolution 
errors, as discussed in the introduction. 

For a quantitative assessment of bootstrap statistical performance we use a similar 
goodness-of-fit statistic vmse, to that defined in Paper I to compare the variance image 
obtained using a given bootstrap resampling method, var xy , to the the reference variance 
image obtained by Monte Carlo simulation, var^ y c . In the current work, the statistic is 
computed over the inner quarter of the image, all Stokes parameters {/, Q,U, V}, and all 
self-calibration iteration numbers k — 1 — 10, as: 





were Vf is the variance scaling factor (IDavison fc Hinkleyl 119971 ). 

The value of Vf was determined numerically for each bootstrap run in a Newton mini- 
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mization of the goodness-of-fit statistic vmse, as defined in equation [21 The resulting fitted 
values of Vf are listed in Table and the associated minima in vmse given in Table O 
both cross-tabulated by run code {A-F, X-Z} and bootstrap code {M1-M4, S1-S4}. The 
values of Vmse i n Tableware normalized per row by the minimum value obtained across 
bootstrap codes {M1-M4, S1-S4} for each run code {A-F, X-Z}. The minimum vmse is 
given in the right-most column of the table. The optimal bootstrap methods are therefore 
indicated in this table by unit relative vmse- Note that the values of vmse given in Table 
5 of Paper I were incorrectly scaled in absolute magnitude and need to be multiplied by a 
factor 1.502 x 10~ 5 for comparison with the vmse values in the current work. 

In Figures [8] through [10] we plot the pixel-based Stokes Q variance for the final polar- 
ization self-calibration iteration for each run code {A-F, X-Z} for both the direct Monte 
Carlo reference simulation and the optimal model-based and subsample bootstrap methods 
identified in Table [61 In these figures, the Stokes Q variance images are drawn as contour 
plots with identical fractional contour levels in each figure sub-panel. 



5. Discussion 

The results presented in the previous section show that the applicability of statistical 
resampling as a tool for imaging fidelity assessment in radio interferometry is upheld from 
our initial evaluation presented in Paper I, when explored over a larger domain subset, here a 
greater range of expected image fidelity and source brightness distribution morphology. The 
current work continues to support the conclusion that contemporary st atistical resa mpling; 



techniques developed for data with long-range statistical dependence ( jLahiril 120031 ) are a 
promising method for estimating the statistical properties of regularized calibration and 
imaging estimators used for the solution of the imaging equation [1] in radio interferometry. 
The technique also intrinsically allows an estimate of image quality per pixel, in contrast to 
coarser direction-independent metrics, such as those based on off-source rms a a ff for example. 
As a bootstrap technique, it is also not sensitive to assumptions about the functional form 
or parametrization of the parent PDF of the measured visibility data, and does not require 
the frequent idealization of uniform Gaussianity for the total noise contribution. 

As described in the preceding section, we assess each bootstrap method quantitively 
against the direct Monte Carlo variance reference image; this goodness-of-fit statistic vmse 
is tabulated in Table [6j An optimal bootstrap method, within a given set, is the method 
that agrees most closely with the Monte Carlo result by this quantitative measure. 



Our finding in Paper I that the model-based bootstrap generally strongly outperforms 
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the subsample bootstrap continues to hold true in the broader evaluation described here. 
The statistical performance of the subsample b ootstrap improves with in creasing; subsam- 



ple delete fraction f s , as expected theoretically (iDavison fc Hinkleyl 119971 ). The subsample 
bootstrap performance drops precipitously for low or moderate delete fraction f s < 0.5 and 
the optimal subsample bootstrap code is found to be S4 consistently across all run codes 
{A-F, X-Z} in the current study. The subsample delete fraction for bootstrap code S4 is 
0.75, the highest value of f s amongst the bootstrap codes {S1-S4}. Despite the poorer sta- 
tistical performance of the subsample bootstrap, we note that it has the advantage of ease 
of implementation and requires no free user-adjustable parameters if a fixed value / s = 0.75 
is adopted. 

The optimal model-based bootstrap code is found to be M3 across run codes {A-F} and 
M4 for run codes {X-Z}. The best single bootstrap over all run codes is M3. The model-based 
bootstrap used in this study models the long-term statistical dependence in the data by fitting 
a series of independent segmented polynomial fits (of maximum degree N p ) to the visibility 
time-series on each interferometer baseline over successive solution intervals of length At^. 
This does not require a priori knowledge of the source brightness distribution, however. Full 
details of the implementation of this method can be found in Paper I. Bootstrap codes M3 
and M4 have increasing solution intervals At^ A > At^ 13 (as enumerated in Table [3]). The 
optimality of M4 over M3 for run codes {X-Z} is consistent with the more compact source 
morphology for these run codes, as noted in Table El and therefore a lower mean rate of 
change of source visibility over time. As shown in Table El the relative performance of the 
model-based bootstrap shows a weak dependence on At fc , compared to the strong dependence 
of the subsample bootstrap performance on the delete fraction f s . In this sense the model- 
based bootstrap is also more robust against sub-optimal bootstrap parameter choices. The 
current study confirms that At 6 should be chosen with regard to the expected mean time 
rate of change of the visibility function. This can also in principle be deduced automatically 
from the actual observed time- variability in V^f. The interval Atf, needs to be chosen as the 
longest interval over which the piece-wise polynomial model for the statistical dependence 
in the visibility time-series holds on each interferometer baseline. 

For the subsample bootstrap, the variance scal ing factor Vf is expe c ted to have a power- 



law dependence on subsample delete fraction f s . IDavison fc Hinkleyl (119971 ) quote an ex- 
pected relation Vf ~ (1 — fs)~ a for the related delete-d jacknife method. The fitted scaling 
factors from the run codes {A-F, X-Z} in the current study are plotted against subsample 
delete fraction f s in Figure [L_2J We find the relation to be broadly consistent across the 
different parallactic angle ranges, which have different numbers of visibilities N vis , and the 
source morphologies considered in the current work. We find empirically in the current study 
that a function of the form Vf = foe - "* 1- ^ provides a better fit to the data (either separately 
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for each run code or jointly across all run codes) than vj = 6(1 — f s ) a or Vf = afl + bf s + c, 
although we stress that the current data do not statistically rule out these models. Further 
studies with a finer sampling in f s are planned to constrain this relation more closely. Addi- 
tional work will also be required to determine its broader applicability; this is an empirical 
relation in its current form. A joint best fit across run code {A-F, X-Z} is drawn as a solid 
line in Figure [12] in the form: 



v f (f s ) = AASle- 3 - 287 ^ (3) 

The subset of runs comprising the direct Monte Carlo simulations over parallactic angle 
ranges {A-F} provide an opportunity to study the performance of the polarization self- 
calibration algorithm introduced in Paper I. The total MSE for the estimator D m , averaged 
over antenna and polarization receptor basis, is plotted on a logarithmic scale against self- 
calibration iteration number in Figure [31 As noted, and demonstrated, in Paper I, the 
algorithm is expected, from relative information arguments, to demonstrate rapid conver- 
gence for the case of good parallactic angle coverage. The current study measures the 
algorithm performance for decreasing parallactic angle range over run codes {A-F}. Fig- 
ure [3] shows that there is good convergence for parallactic angle ranges Aa ~ 80 — 140° that 
include source transit, moderate convergence for Aa ~ 30 — 60°, and poor convergence for 
Aa ~ 0°. This is consistent with heuristic rules of thumb for other interferometric polariza- 
tion calibration techniques that recom mend a minimum parallactic angle range Aa > 100° 
( Leppanen. Zensus. fc Diamond! fl~995l ). 



Although the current study has focused on measuring the variance of the sampling distri- 
bution of the imaging estimator S(p), we show initial results on bias estimation in Figure [TTJ 
These example images are selected for their expected high levels of bias, as anticipated in 
the current study for the case of Stokes {Q,U} images at low polarization self-calibration 
iteration numbers, where the instrumental polarization terms will still have high MSE. As 
for the variance study, we assess the statistical performance of the bootstrap resampling 
estimate of bias against the result obtained by direct Monte Carlo simulation. Figure [TT1 
shows that the optimal variance estimators, as identified in Table El show qualified promise 
as bias estimators in the high-bias regime. However, we note that our pre liminary experience 



i s that they have lower statistical performance than variance estimators. lEfron fc Tibshirani 



( 119931 ) discuss the performance of bootstrap bias estimators, and propose techniques that 
can significantly improve their statistical performance. We intend to examine bias estimation 
in this application in more detail in a future paper. 



We also note that new statistical imaging techniques of the type considered in this 
paper allow novel approaches to understanding direction-dependent radio-interferometric 
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calibration and imaging fidelity, especially those challenges posed by wide-field imaging at 
uniform, high dynamic range, as needed for future leading-edge telescopes such as the SKA. 
These techniques offer a new approach, distinct from instrument simulation which has well- 
known limitations, to derive direction-dependent imaging fidelity assessments from single 
observations with real prototype arrays. 

The computational cost of the bootstrap resampling techniques evaluated here is ap- 
proximately ~ O(10 2 ) times the processing cost for a single dataset without resampling. This 
places it well within contemporary HPC resources at the teraflop scale and those projected 
to be available soon in the petascale era (lBaderll2008l ). The cost of the algorithm is strongly 
mitigated by its high degree of scalability and effective parallel decomposition. 



6. Conclusions 

We draw the following conclusions from the current evaluation of bootstrap resampling 
for radio-interferometric imaging fidelity assessment over a broader domain subset: 

1. Statistical resampling techniques for dependent data remain a promising approach to 
the important problem of pixel-level fidelity assessment in radio-interferometric cali- 
bration and imaging. They show good statistical performance as assessed against direct 
Monte Carlo simulation, and are computationally affordable and scalable on contem- 
porary and future HPC resources. They have the desirable statistical properties of 
independence on the detailed functional form or parametrization of the parent PDF of 
the observed visibility data and of not requiring an analytic solution for the calibration 
and imaging inference problem. 

2. Model-based bootstrap techniques outperform subsampling bootstrap methods in gen- 
eral in our study, although both are of value in this application. The subsample boot- 
strap performs well for delete fraction f s ~ 0.75, is straight-forward to implement, and 
has no adjustable parameters if f s is adopted as this value. The model-based bootstrap 
has superior statistical performance in general. It has free parameters (N p , Atb) that 
can in principle be deduced from the observed data and against which the method is 
relatively robust to the exact values used. 

3. The subsample bootstrap variance scaling relation Vf = f(f s ) measured in the current 
study shows relatively limited variability across the broader domain subset considered 
here. The current data are consistent with an exponential function: Vf = be^ a ^^ s \ 
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4. The polarization self-calibration algorithm introduced in Paper I is found to converge 
at a faster than exponential rate for good parallactic coverage Aa. Its performance 
for lower parallactic angle ranges is found to deteriorate for Aa < 100° and to break 
down in the limit as Aa — > as expected. 

5. An initial examination of statistical resampling for bias estimation shows qualified 
promise in the high-bias regime in this discipline, but further evaluation is required. 
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Table 1. Parallactic angle ranges and run codes 





|Aqi| 


|Aq 2 | 


|Aa| 


N vis 


Run code 


(deg) 


(deg) 


(deg) 




A 


155 


123 


139 


222750 


B 


155 


67 


111 


182250 


C 


155 


4 


79.5 


141750 


D 


121 





60.5 


101250 


E 


59 





29.5 


60750 


F 


3 





1.5 


20250 



Note. — Aai and A«2 are the parallactic an- 
gle ranges at the central antenna (Pie Town) before 
and after source transit. The mean value is denoted 
as Act. The number of visibilities in each simulated 
dataset is listed under column N v i 3 . 



Table 2. Simulation source model component separations 





Aa 


A8 


Run code 


(mas) 


(mas) 


A 


-0.5 


+0.1 


X 


-0.25 


+0.05 


Y 


-0.125 


+0.025 


Z 


-0.0625 


+0.0125 



Table 3. Model-based bootstrap run parameters 



Code 


max(ATp) 


Atj, (sec) 


Ml 


1 


60 


M2 


1 


150 


M3 


1 


300 


M4 


2 


900 



Note. — max(JVp) is the maxi- 
mum degree of the model polyno- 
mial used in each bootstrap model 
interval Atf,. 
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Table 4. Subsample bootstrap run parameters 



Code 


u 


SI 


0.125 


S2 


0.25 


S3 


0.5 


S4 


0.75 



Note. — f s is 
the fraction of 
data deleted in 
each subsample 
realization 



Table 5. Bootstrap performance: variance scale factor Vf 



Run cod' 


a Ml 


M2 


M3 


M4 


SI 


S2 


S3 


S4 


A 


1.00 


1.00 


0.99 


1.02 


0.16 


0.33 


0.77 


1.93 


B 


0.95 


0.99 


1.01 


1.03 


0.19 


0.34 


0.78 


1.94 


C 


0.97 


1.02 


1.00 


1.03 


0.20 


0.38 


0.91 


2.05 


o 


0.98 


0.98 


0.98 


0.99 


0.19 


0.43 


0.91 


1.90 


E 


0.96 


0.98 


0.95 


1.01 


0.28 


0.52 


1.16 


2.37 


F 


0.91 


0.94 


1.03 


0.98 


0.29 


0.48 


1.02 


1.70 


X 


0.97 


0.98 


1.00 


1.00 


0.20 


0.37 


0.80 


1.94 


Y 


0.96 


1.00 


1.01 


1.02 


0.19 


0.37 


0.79 


1.92 


Z 


0.97 


1.00 


1.02 


1.02 


0.20 


0.37 


0.79 


1.94 



Note. — The bootstrap performance measure Vf is defined in the main 
text. 
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Table 6. Bootstrap performance: variance mean-squared error vmse 





Ml 


M2 


M3 


M4 


SI 


S2 


S3 


S4 


m'm(vMSE) 


Run code 


















xlO" 15 


A 


1.42 


1.05 


1.00 


1.07 


3.18 


2.53 


1.73 


1.52 


3.40 


B 


1.36 


1.12 


1.00 


1.14 


5.81 


3.87 


2.14 


1.64 


4.73 


C 


1.40 


1.17 


1.00 


1.15 


5.10 


3.65 


2.32 


1.73 


6.67 


D 


1.53 


1.18 


1.00 


1.13 


5.51 


5.49 


3.48 


2.73 


16.4 


E 


1.39 


1.27 


1.00 


1.41 


5.43 


5.09 


4.05 


2.03 


20.0 


F 


2.30 


1.47 


1.00 


2.28 


14.8 


12.3 


10.7 


7.94 


985 


X 


1.28 


1.10 


1.02 


1.00 


4.90 


3.70 


1.72 


1.22 


3.39 


Y 


1.45 


1.35 


1.17 


1.00 


5.40 


4.49 


2.16 


1.51 


2.91 


Z 


1.42 


1.23 


1.14 


1.00 


5.93 


4.27 


2.05 


1.38 


2.97 



Note. — The right-most column is the minimum vmse measured across bootstrap 
type for each run code. The values for bootstrap codes {M1-M4, S1-S4} in each row 
are scaled by this minimum value, to allow the relative performance of the different 
bootstrap methods for a given run code to be more easily compared. 
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Fig. 1. — Parallactic angle variation at the central antenna (Pie Town) for the data used 
in the Monte Carlo simulations for run codes A-F, as enumerated in Table [U Vertical lines 
denote the schedule truncation for each run code, as used to vary the parallactic angle range 
in the simulated data. 
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Fig. 2.— Simulation source model, convolved with a circular restoring 

beam of 156.007 fias, plotted in Stokes {I,Q,U,V}, using contour levels of 
{-64, -32, -16, -8, -4, -2, -1, 1, 2, 4, 8, 16, 32, 64} x 4.434 x HT 3 Jy per 
beam. The peak brightness in Stokes {J, Q, U, V} is 4.434 x 10~\ 8.867 x 10~ 2 , 9.591 x 10~ 2 , 
and —3.836 x 10~ 2 Jy per beam respectively. The source model consists of two polar- 
ized Gaussian components, denoted here by subscripts one and two, with parameters: 



(Ae*i = 0, A5i = 0), b 



maj\ 



0.2 mas, b 



mini 



0.15 mas, P.A.i = 30 deg, I x = 1 Jy, Qj 



0.2 Jy, Ui = 0.1 Jy, Vi = Jy, (Aa 2 = —0.5, A5 2 = +0.1) mas, b ma j 2 = 0.3 mas, b miri2 = 
0.1 mas, P.A.2 = -10 deg, I 2 = 1 Jy, Q 2 = Jy, U 2 = 0.25 Jy, V 2 = -0.1 Jy. This 
Figure is reproduced from Figure 1 and Table 2 in Paper I. 
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Fig. 3. — Total mean-squared error in the off-diagonal D m matrix elements, plotted for each 
run code {A-F} against polarization self-calibration iteration number. As noted in Paper I, 
the total calibration MSE is computed here as: ^2 pe m L \ ^((^, — ^m)(^m — ^m) > where 
dP m is the instrumental polarization determined by the solver, and d p m is the true value used 
to generate the simulated data. The decreasing parallactic angle ranges for run codes {A-F} 
are listed in Table [TJ 
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Fig. 4. — The imaging estimator Stokes Q variance for the final iteration of polarization self- 
calibration for run code B, as measured by direct Monte Carlo simulation (MC), model-based 
bootstrap resampling (M1-M4), and subsample resampling (S1-S4) methods(m sequential 
order from top left to bottom right). The parameters for these bootstrap resampling codes 
are listed in Table [3] and Table H] respectively. The run code parameters are given in Table [I] 
and Table [2j The same default color mapping is used for each bootstrap image variance 
cube, where increasing color brightness denotes increasing image variance. This figure is 
included to allow morphological comparison of bootstrap variance images; a quantitative 
assessment of bootstrap performance is given in Table [6j See captions to Figures [BJ [H and 
[TOl for quantitative variance ranges used in the associated contour plots. 




Fig. 5. — Same as Figure HI but for run code D. 




Fig. 6. — Same as Figure [5], but for run code X. 




Fig. 7. — Same as Figure [BJ but for run code Z. 
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Fig. 8. — The imaging estimator Stokes Q variance for the final polarization self-calibration 
iteration obtained by direct Monte Carlo simulation (MC; left), and for the optimum model- 
based (M3; center column) and subsample (S4; right-most column) bootstrap methods iden- 
tified in Table [6] for run codes A, B, and C (from top to bottom). The parameters for 
these bootstrap resampling codes are listed in Table [3] and Table H] respectively. The run 
code parameters are given in Table [U and Table [21 The contour levels are plotted at levels 
{0, 0.1, 0.2,. ..,1.0} of the peak variance values: (A-MC: 1.883, A-M3: 1.783, A-S4: 1.924, 
B-MC: 2.534, B-M3: 2.429, B-S4: 3.089, C-MC: 3.697, C-M3: 3.450, C-S4: 5.580) xlO" 6 
(Jy/beam) 2 . 
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Fig. 9. — Same as Figure EJ but for run codes D, E, and F (from top to bottom), The contour 
levels are plotted at levels {0, 0.1, 0.2,. ..,1.0} of the peak variance values: (D-MC: 12.74, 
D-M3: 13.36, D-S4: 18.22, E-MC: 6.839, E-M3: 7.048, E-S4: 13.74, F-MC: 44.37, F-M3: 
51.70, E-S4: 114.1) xl0~ 6 (Jy/beam) 2 . 
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Fig. 10. — Same as Figure EJ but for run codes X, Y, and Z (from top to bottom), The 
contour levels are plotted at levels {0, 0.1, 0.2,.. .,1.0} of the peak variance values: (X-MC: 
1.451, X-M4: 1.372, X-S4: 2.415, Y-MC: 1.504, Y-M4: 1.438, Y-S4: 2.012, Z-MC: 1.507, 
Z-M4: 1.650, Z-S4: 2.351) xKT 6 (Jy/beam) 2 . 
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Fig. 11. — The imaging estimator Stokes Q bias for the first iteration of polarization self- 
calibration for run codes X (top) and Z (bottom), as measured by direct Monte Carlo simu- 
lation (MC; left), model-based bootstrap resampling (M4; center), and subsample bootstrap 
resampling (S4; right) methods. The bootstrap codes are chosen for their optimality, as 
indicated in Table El The same default color mapping is used for each bootstrap image bias 
cube, where increasing color brightness denotes increasing image bias. 




Fig. 12. — The variance scaling factor, Vf, plotted against the subsampling delete fraction, 
f s , obtained from subsample bootstrap runs {S1-S4} for run codes {A-F, X-Z}. The best 
joint exponential fit is plotted as a solid line, with parameters as described in the main text. 



